(c) in 1995-97 by Volker Barthelmann
This document is under construction!
This document describes some of the internals of vbcc and tries to explain
what has to be done to write a code generator for vbcc.
However, if you want to write one, I suggest contacting me first,
so that it can be integrated into the source tree.
You have to create a new directory for the new target named
machines/<target-name> and write the files machine.c, machine.h
and machine.dt. The compiler for this target will be called
vbcc<target-name> and can be built by the statement
"make TARGET=<target-name> bin/vbcc<target-name>".
From now on integer means any of {char, short, int, long} or their
unsigned counterparts. Arithmetic means integer or float or double.
Elementary type means arithmetic or pointer.
If you intend to write a code generator for a machine with multiple
different kinds of pointers you might have some problems.
THE INTERMEDIATE CODE
vbcc will generate intermediate code for every function and pass this code
to the code generator which has to convert it into the desired output.
In the future there may be a code generator generator which reads a machine
description file and generates a code generator from that, but it is not
clear whether this could simplify much without taking penalties in the
generated code.
Anyway this would be a layer on top of the current interface to the code
generator so that the interface described in this document would still be
valid and accessible.
The intermediate code is represented as a doubly linked list of quadruples
(I am calling them ICs from now on) consisting mainly of an operator, two
source operands and a target. They are represented like this:
struct IC{
    struct IC *prev;
    struct IC *next;
    int code;
    int typf;
    [...]
    struct obj q1;
    struct obj q2;
    struct obj z;
    [...]
};
The only members relevant to the code generator are 'prev', 'next', 'code',
'typf', 'q1', 'q2' and 'z'.
'prev' and 'next' are pointers to the previous and next IC.
The first IC has 'prev'==0 and the last one has 'next'==0.
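As a minimal sketch of how a code generator might walk this list, the following uses a reduced stand-in for struct IC that keeps only the linkage and 'code'. The helper names count_ics() and last_ic() are made up for this illustration and are not part of vbcc.

```c
#include <stddef.h>

/* Reduced stand-in for vbcc's struct IC: only the members needed here. */
struct IC {
    struct IC *prev;
    struct IC *next;
    int code;
};

/* Count the ICs of a function by following 'next' until it is 0. */
static int count_ics(const struct IC *first)
{
    int n = 0;
    const struct IC *p;

    for (p = first; p != NULL; p = p->next)
        n++;
    return n;
}

/* Return the last IC (the one with 'next'==0), or 0 for an empty list. */
static struct IC *last_ic(struct IC *first)
{
    if (first == NULL)
        return NULL;
    while (first->next != NULL)
        first = first->next;
    return first;
}
```

Walking backwards works the same way through 'prev', starting from the IC returned by last_ic().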
'typf' is the type of the operands of this IC. This can be one of:
#define CHAR 1
#define SHORT 2
#define INT 3
#define LONG 4
#define FLOAT 5
#define DOUBLE 6
#define VOID 7
#define POINTER 8
#define ARRAY 9
#define STRUCT 10
#define UNION 11
#define ENUM 12 /* not relevant for code generator */
#define FUNKT 13
and can additionally be or'ed with
#define UNSIGNED 16
#define CONST 64
#define VOLATILE 128
#define UNCOMPLETE 256
However only UNSIGNED is of real importance for the code generator.
'typf'&NQ yields the type without any qualifiers, 'typf'&NU yields
the type without any qualifiers but UNSIGNED.
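The masking can be sketched as follows. The numeric values of NQ and NU are not given in this document; 15 and 31 are assumptions that merely fit the type and qualifier values listed above. base_type() and is_unsigned() are hypothetical helpers.

```c
/* Type and qualifier values as listed in this document. */
#define CHAR      1
#define INT       3
#define UNSIGNED 16
#define CONST    64
#define VOLATILE 128

/* ASSUMED mask values, chosen to fit the numbers above. */
#define NQ 15   /* keeps only the base type             */
#define NU 31   /* keeps the base type and UNSIGNED     */

/* Base type of an IC, with all qualifiers stripped. */
static int base_type(int typf)
{
    return typf & NQ;
}

/* Nonzero if the operands of the IC are unsigned. */
static int is_unsigned(int typf)
{
    return (typf & UNSIGNED) != 0;
}
```

A code generator would typically switch on base_type(p->typf) to select the operand size and consult is_unsigned() only where signedness changes the instruction, e.g. for divisions, right shifts and comparisons.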
'q1', 'q2' and 'z' are the source1 (quelle1 in German), source2 and target
(ziel).
If a result has to be computed, it always will be stored in the object 'z'
and the objects 'q1' and 'q2' usually may not be destroyed during this
operation.
The objects are described by this structure.
struct obj{
    int flags;
    int reg;
    struct Var *v;
    struct AddressingMode *am;
    union atyps{
        zchar vchar;
        zuchar vuchar;
        zshort vshort;
        zushort vushort;
        zint vint;
        zuint vuint;
        zlong vlong;
        zulong vulong;
        zfloat vfloat;
        zdouble vdouble;
        zpointer vpointer;
    }val;
};
'flags' describes what kind the object is. It can be a combination of
#define VAR 1
The object is a variable. The pointer to its struct Var is in 'v'.
'val.vlong' contains an offset that has to be added to it.
A struct Var looks like:
struct Var{
    int storage_class;
    [...]
    char *identifier;
    [...]
    zlong offset;
    [...]
};
The relevant entries are:
'identifier':
The name of the variable. Usually only of interest for variables
with external-linkage.
'storage_class':
One of:
#define AUTO 1
#define REGISTER 2
#define STATIC 3
#define EXTERN 4
#define TYPEDEF 5 /* not relevant */
If the variable is not assigned to a register (i.e. bit REG
is not set in the flags of the corresponding struct obj) then
the variable can be addressed in the following ways (with
examples of 68k-code):
'storage_class' == AUTO or 'storage_class' == REGISTER:
'offset' contains the offset inside the local-variables section.
The code generator must decide how it's going to handle the
activation record.
If 'offset' < 0 then the variable is a function argument on the
stack. In this case the offset in the parameter-area is
- ('offset' + 'maxalign').
The code generator may have to calculate the actual offset
to a stack- or frame-pointer from the value in 'offset'.
'offset'+'val.vlong'(sp)
Note that 'storage_class' REGISTER is equivalent to AUTO - whether
the variable is actually assigned a register is specified by
the bit REG in the 'flags' of the 'struct obj'.
'storage_class' == EXTERN
The variable can be addressed through its name in 'identifier'.
'val.vlong'+_'identifier'
'storage_class' == STATIC
The variable can be addressed through a numbered label. The
label number is stored in 'offset'.
'val.vlong'+l'offset'
#define KONST 2
The object is a constant. Its value is in the corresponding (to 'typf')
member of 'val'.
#define DREFOBJ 32
The content of the location in memory the object points to is used.
#define REG 64
The object is a register. 'reg' contains its number.
#define VARADR 128
The address of the object is to be used. Only together with static
variables (i.e. 'storage_class' STATIC or EXTERN).
The possible combinations of these flags should be:
0 (no object)
KONST
REG
VAR
VAR|REG
REG|DREFOBJ
VAR|DREFOBJ
VAR|REG|DREFOBJ
VAR|VARADR
Also some other bits which are not relevant to the code generator may be set.
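The addressing rules for variables that are not in a register can be sketched like this. The structures are reduced stand-ins, host long replaces zlong, and the '#' prefix for VARADR is an assumption about the assembler syntax. var_operand() is a hypothetical helper; it ignores the special case of negative offsets (stack arguments) described above.

```c
#include <stdio.h>

#define VAR       1
#define KONST     2
#define DREFOBJ  32
#define REG      64
#define VARADR  128

#define AUTO      1
#define REGISTER  2
#define STATIC    3
#define EXTERN    4

/* Reduced stand-ins; 'long' replaces zlong for this sketch only. */
struct Var { int storage_class; const char *identifier; long offset; };
struct obj { int flags; int reg; struct Var *v; long val; };

/* Write a 68k-style operand for a plain variable into buf.
   Returns 0 on success, -1 if the object is not handled here. */
static int var_operand(char *buf, size_t n, const struct obj *o)
{
    const char *imm = (o->flags & VARADR) ? "#" : "";  /* assumed syntax */

    if (!(o->flags & VAR) || (o->flags & (REG | DREFOBJ)))
        return -1;
    switch (o->v->storage_class) {
    case AUTO:
    case REGISTER:          /* 'offset'+'val.vlong'(sp) */
        snprintf(buf, n, "%ld(sp)", o->v->offset + o->val);
        return 0;
    case EXTERN:            /* 'val.vlong'+_'identifier' */
        snprintf(buf, n, "%s%ld+_%s", imm, o->val, o->v->identifier);
        return 0;
    case STATIC:            /* 'val.vlong'+l'offset' */
        snprintf(buf, n, "%s%ld+l%ld", imm, o->val, o->v->offset);
        return 0;
    }
    return -1;
}
```

Objects with REG or DREFOBJ set are rejected on purpose: they need a register name or an extra load, which a real code generator would handle in separate cases.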
Constants will usually be in 'q2' if possible. At least one of the sources is
always non-constant and the target is always an lvalue.
Unless otherwise specified all operands of an IC are of the type 'typf'
(which may be further restricted by 'code'). However not all objects have
to be used.
This depends on 'code' and is listed below. In most cases (i.e. when not
explicitly stated) 'typf' is an elementary type (i.e. arithmetic or pointer).
'am' can be used to store information on special addressing modes.
This has to be handled by the code generator. However 'am' has to be 0
or has to point to a struct AddressingMode that was allocated using malloc()
when the code generator returns.
'val' stores either the value of the object if it is a constant or an offset
if it is a variable.
'code' describes the operation and can be one of:
#define ASSIGN 2
Copy 'q1' to 'z'. 'q2.val.vlong' contains the size of the objects (this is
necessary if it is an array or a struct). 'typf' does not have to be an
elementary type!
The only case where 'typf' == ARRAY should be in automatic initializations.
It is also possible that ('typf'&NQ) == CHAR but the size is != 1. This is
created for an inline memcpy/strcpy where the type is not known.
#define OR 16
#define XOR 17
#define AND 18
Bitwise boolean operations. q1,q2->z.
All operands are integers.
#define LSHIFT 25
#define RSHIFT 26
Bit shifting. q1,q2->z. 'q2' is the number of shifts.
All operands are integers.
#define ADD 27
#define SUB 28
#define MULT 29
#define DIV 30
Standard arithmetic operations. q1,q2->z.
All operands are of arithmetic types (integers or floating point).
#define MOD 31
Modulo (%). q1,q2->z.
All operands are integers.
#define KOMPLEMENT 33
Bitwise complement. q1->z.
All operands are integers.
#define MINUS 38
Unary minus. q1->z.
All operands are of arithmetic types (integers or floating point).
#define ADDRESS 40
Get the address of an object. q1->z.
'z' is always a pointer and 'q1' is always an auto variable.
#define CALL 42
Call the function 'q1'. Currently 'q1' is a function rather than a pointer
to a function. This may change in the future.
'q2.val.vlong' contains the number of bytes pushed on the stack as
function arguments for this call. Those may have to be popped from the
stack after the function returns depending on the calling mechanism.
#define CONVCHAR 50
#define CONVSHORT 51
#define CONVINT 52
#define CONVLONG 53
#define CONVFLOAT 54
#define CONVDOUBLE 55
#define CONVPOINTER 57
#define CONVUCHAR 58
#define CONVUSHORT 59
#define CONVUINT 60
#define CONVULONG 61
Convert one type to another. q1->z.
'z' is always of the type 'typf'. 'q1' is a short in CONVSHORT and an
unsigned long in CONVULONG etc.
Conversions floating point<->pointers do not occur.
#define ALLOCREG 65
From now on the register 'q1.reg' is in use. No code has to be generated
for this, but it is probably necessary to keep track of the registers
in use to know which registers are available for the code generator
at a time and which registers the function trashes.
#define FREEREG 66
From now on the register 'q1.reg' is free.
Also it means that the value currently stored in 'q1.reg' is not used any
more and therefore provides a little bit of data flow information.
Note however that if a FREEREG follows a branch the value of the register
may be used at the target of the branch.
#define COMPARE 77
Compare and set condition codes. q1,q2(->z).
Compare the operands and set the condition code, so that
BEQ, BNE, BLT, BGE, BLE or BGT works.
If 'z.flags' == 0 then the condition codes will be evaluated immediately
after the COMPARE, i.e. the next instruction (except possible FREEREGs)
will be a conditional branch.
However if a target supports several condition code registers and sets
the global variable 'multiple_ccs' to 1 vbcc might use those registers
and perform certain optimizations. Then 'z' may be non-empty and the
condition codes have to be stored in 'z'.
#define TEST 68
Test 'q1' against 0 and set condition codes. q1.
This is equal to COMPARE 'q1',(corresponding constant 0)
but only the condition code for BEQ and BNE has to be set.
#define LABEL 69
Generate a label. 'typf' specifies the number of the label.
#define BEQ 70
#define BNE 71
#define BLT 72
#define BGE 73
#define BLE 74
#define BGT 75
Branch on condition codes. (q1)
'typf' specifies the label where program execution shall continue, if the
condition code is true (otherwise continue with next statement).
The condition codes mean equal, not equal, less than, greater or equal,
less or equal and greater than.
If 'q1' is empty (q1.flags==0) then the codes set by the last COMPARE
or TEST must be evaluated. Otherwise 'q1' contains the condition codes.
On some machines the type of operands of a comparison (e.g. unsigned or
signed) is encoded in the branch instructions rather than in the
comparison instructions. In this case the code generator has to keep
track of the type of the last comparison.
#define BRA 76
Branch always. 'typf' specifies the label where program execution
continues.
#define PUSH 78
Push q1 on the stack. q1.
'q2.val.vlong' contains the size of the object and 'q1' does not have to
be an elementary type (see ASSIGN).
This is only used for passing function arguments.
#define ADDI2P 81
Add an integer to a pointer. q1,q2->z.
'q1' and 'z' are always pointers and 'q2' is an integer of type 'typf'.
'z' has to be 'q1' increased by 'q2' bytes.
#define SUBIFP 82
Subtract an integer from a pointer. q1,q2->z.
'q1' and 'z' are always pointers and 'q2' is an integer of type 'typf'.
'z' has to be 'q1' decreased by 'q2' bytes.
#define SUBPFP 83
Subtract a pointer from a pointer. q1,q2->z.
'q1' and 'q2' are pointers and 'z' is an integer of type 'typf'.
'z' has to be 'q1' - 'q2' in bytes.
#define GETRETURN 93
Get the return value of the last function call. ->z.
If the return value is in a register this will be in 'q1.reg'. Otherwise
'q1.reg' will be 0.
This follows immediately after a CALL instruction (except possible
FREEREGs).
#define SETRETURN 94
Set the return value of the current function. q1.
If the return value is in a register this will be in 'z.reg'. Otherwise
'z.reg' will be 0.
This is immediately followed by a function exit (i.e. it is the last
IC or followed by an unconditional branch to a label which is the last
IC - always ignoring FREEREGs).
#define MOVEFROMREG 95
Move a register to memory. q1->z.
'q1' is always a register and 'z' an array of size 'regsize[q1.reg]'.
#define MOVETOREG 96
Load a register from memory. q1->z.
'z' is always a register and 'q1' an array of size 'regsize[z.reg]'.
#define NOP 97
Do nothing.
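The bookkeeping that ALLOCREG, FREEREG and the comparison ICs ask for could be sketched as a pass over the list. The structures are reduced stand-ins, MAXR is set to an arbitrary value, and struct cg_state and scan_ics() are names made up for this sketch.

```c
#include <stddef.h>

#define MAXR     16   /* arbitrary register count for this sketch */
#define UNSIGNED 16
#define ALLOCREG 65
#define FREEREG  66
#define TEST     68
#define COMPARE  77

/* Reduced stand-ins for the structures described above. */
struct obj { int flags; int reg; };
struct IC {
    struct IC *prev, *next;
    int code, typf;
    struct obj q1, q2, z;
};

/* Hypothetical per-function state of a code generator. */
struct cg_state {
    int in_use[MAXR + 1];    /* register currently allocated         */
    int trashed[MAXR + 1];   /* register was allocated at some point */
    int last_cmp_unsigned;   /* signedness of last COMPARE/TEST      */
};

static void scan_ics(const struct IC *p, struct cg_state *s)
{
    for (; p != NULL; p = p->next) {
        switch (p->code) {
        case ALLOCREG:
            s->in_use[p->q1.reg] = 1;
            s->trashed[p->q1.reg] = 1;
            break;
        case FREEREG:
            s->in_use[p->q1.reg] = 0;
            break;
        case COMPARE:
        case TEST:
            /* needed if the branch instruction encodes signedness */
            s->last_cmp_unsigned = (p->typf & UNSIGNED) != 0;
            break;
        default:
            /* a real generator would emit code for the IC here */
            break;
        }
    }
}
```

In a real gen_code() the same switch would also contain the cases that print instructions; the state is kept separate here only to keep the sketch short.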
TARGET DATA TYPES
As the compiler should be portable we must not assume anything about
the data types of the host system which is not guaranteed by
ANSI/ISO C. Especially do not assume that the data types of the host
system correspond to the ones of the target system.
Therefore vbcc will provide typedefs which can hold a data type
of the target machine and (as there is no operator overloading in C)
functions to perform arithmetic on these types.
The typedefs for the target's data types are:
zchar type char on the target machine
zuchar type unsigned char on the target machine
zshort ...
zushort
zint
zuint
zlong
zulong
zfloat
zdouble
zpointer a byte pointer on the target machine
These typedefs and arithmetic functions to work on them will be
generated by the program dtgen when compiling vbcc.
It will create the files machines/$(TARGET)/dt.h and dt.c.
These files are generated from the file machines/$(TARGET)/machine.dt
which must describe what representations the code generator needs.
dtgen will then ask for available types on the host system and
choose appropriate ones and/or install emulation functions if available.
machine.dt must look as follows:
Every data type representation gets a symbol (the ones which are
already available can be looked up in datatypes/datatypes.h - new
ones will be added when necessary).
The first 11 lines now must contain the representations for the
following types:
line type
1 signed char
2 unsigned char
3 signed short
4 unsigned short
5 signed int
6 unsigned int
7 signed long
8 unsigned long
9 float
10 double
11 void *
If the code generator can use several representations these can be
added on the same line separated by spaces. E.g. the code generator
for m68k does not care if the integers are stored big-endian or
little-endian on the host system because it only accesses them through
the provided arithmetic functions. It does, however, access floats
and doubles through byte-pointers and therefore requires them to
be stored in big-endian-format.
TARGET ARITHMETIC
Now you have a lot of functions/macros performing operations using the
target machine's arithmetic. You can look them up in dt.h/dt.c.
E.g. zladd() takes two zlongs and returns their sum as zlong. zuladd() does
the same with zulongs, zdadd() with doubles. No functions for smaller types
are needed because you can calculate with the wider types and convert the
results down if needed.
Also there are conversion functions which convert between types of the
target machine. E.g. zl2zc takes a zlong and returns the value converted
to a zchar.
Again look at dt.h/dt.c to see which ones are there.
A few functions for converting between target and host types are also
there, e.g. l2zl takes a long and returns it converted to a zlong.
Finally, there are functions for comparing target data types. E.g.
zlleq(a,b) returns true if zlong a <= zlong b and false otherwise.
zleqto(a,b) returns true if zlong a == zlong b and false otherwise.
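A short sketch of calculating with target types only through these functions. The real definitions come from the generated dt.h/dt.c; the ones below are stand-ins that assume the host long can represent the target long. total_size() and fits_limit() are hypothetical helpers.

```c
/* STAND-INS for what dtgen would generate; they assume that the
   host 'long' can represent the target's long. */
typedef long zlong;
static zlong l2zl(long x)            { return x; }
static zlong zladd(zlong a, zlong b) { return a + b; }
static int   zlleq(zlong a, zlong b) { return a <= b; }
static int   zleqto(zlong a, zlong b){ return a == b; }

/* Sum n host-side sizes as a target long. No plain '+' is applied
   to a zlong; every step goes through the target functions. */
static zlong total_size(const long *sizes, int n)
{
    zlong sum = l2zl(0L);
    int i;

    for (i = 0; i < n; i++)
        sum = zladd(sum, l2zl(sizes[i]));
    return sum;
}

/* Nonzero if 'size' does not exceed 'limit' (both target longs). */
static int fits_limit(zlong size, zlong limit)
{
    return zlleq(size, limit);
}
```

Written this way the code stays valid even on a host where zlong is not an elementary type, because only the stand-ins at the top would change.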
ADDRESSING-MODES
The intermediate code generated by vbcc does not use any
addressing-modes a target might offer. Therefore the code generator
must find a way to combine several statements if it wants to make use
of these modes. E.g. on the m68k the intermediate code
add int #20,a0->a1
move int #10->(a1)
freereg a1
could be translated to
move.l #10,20(a0)
(notice the freereg which is important).
To aid in this there is a pointer to a struct AddressingMode in every
struct obj. A code generator could e.g. do a pass over the intermediate
code, find possible uses for addressing-modes, allocate a struct
AddressingMode and store a pointer in the struct obj effectively
replacing the obj.
If the code generator supports extended addressing-modes you have to think
of a way to represent them and define the structure AddressingMode so that
all modes can be stored in it. The machine independent part of vbcc will
not use these modes, so your code generator has to find a way to combine
several statements to make use of these modes.
When the code generator is done that pointer in every struct obj must
either be zero or point to a malloc'ed struct AddressingMode which
will be free'd by vbcc.
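The m68k example above could be detected roughly as follows. This is only a sketch: the layout of struct AddressingMode is entirely up to the code generator, so the one used here (base register plus offset) is an assumption, as are the reduced structures, the use of host long for the constant, and the helper name fold_store().

```c
#include <stdlib.h>

#define VAR       1
#define KONST     2
#define DREFOBJ  32
#define REG      64
#define VARADR  128
#define FREEREG  66
#define ADDI2P   81
#define NOP      97

/* Only the flag bits described in this document are compared. */
#define KIND(f) ((f) & (VAR | KONST | DREFOBJ | REG | VARADR))

struct AddressingMode { int base; long offset; };  /* assumed layout */
struct obj { int flags; int reg; struct AddressingMode *am; long val; };
struct IC { struct IC *prev, *next; int code; struct obj q1, q2, z; };

/* Fold  ADDI2P #c,rb->rt ; <op> ...->(rt) ; FREEREG rt
   into a single IC storing through c(rb). Returns 1 if folded. */
static int fold_store(struct IC *add)
{
    struct IC *use, *fr;
    struct AddressingMode *am;
    int rt;

    if (add == NULL || add->code != ADDI2P) return 0;
    if (KIND(add->q1.flags) != REG || KIND(add->q2.flags) != KONST ||
        KIND(add->z.flags) != REG) return 0;
    rt = add->z.reg;

    use = add->next;
    if (use == NULL || KIND(use->z.flags) != (REG | DREFOBJ) ||
        use->z.reg != rt) return 0;
    /* the sources of the store must not read the temporary */
    if ((use->q1.flags & REG) && use->q1.reg == rt) return 0;
    if ((use->q2.flags & REG) && use->q2.reg == rt) return 0;

    /* without the freereg the temporary might still be needed */
    fr = use->next;
    if (fr == NULL || fr->code != FREEREG || fr->q1.reg != rt) return 0;

    am = malloc(sizeof *am);   /* vbcc expects malloc'ed memory */
    if (am == NULL) return 0;
    am->base = add->q1.reg;
    am->offset = add->q2.val;
    use->z.am = am;
    add->code = NOP;           /* the addition is no longer needed */
    return 1;
}
```

When printing the store, the code generator would then check 'z.am' first and emit the offset(base) form instead of the plain register-indirect operand.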
MACHINE.H
The first statement should be #include "dt.h".
#define MAXR to the number of available registers.
#define MAXGF to the number of command line flags that can be used to
configure the behaviour of the code generator. This must be at least one
even if you do not use any flags.
#define USEQ2ASZ as 0 or 1; if it is set to 0, no ICs where 'q2' == 'z'
will be generated. This is because those ICs might be hard to implement
efficiently on certain CPUs.
#define MINADDI2P to the smallest integer type (i.e. CHAR, SHORT or INT)
that can be added to a pointer. Smaller types will be automatically converted
to MINADDI2P when they are to be added to a pointer.
This may be subsumed by shortcut() in the future.
#define BIGENDIAN as 1 if integers are represented in big endian, i.e. the
most significant byte is at the lowest memory address, the least significant
byte at the highest.
#define LITTLEENDIAN as 1 if integers are represented in little endian, i.e.
the least significant byte is at the lowest memory address, the most
significant byte at the highest.
#define SWITCHSUBS as 1 if switch-statements should be compiled into a
series of SUB/TEST/BEQ instructions rather than COMPARE/BEQ. This may be
useful if the target has a more efficient SUB-instruction (e.g. 68k).
#define INLINEMEMCPY to the largest size in bytes allowed for inline memcpy.
Calls to memcpy/strcpy with a known size smaller than INLINEMEMCPY may be
replaced by a single ASSIGN IC by vbcc.
This may be replaced by a variable of type zlong in the future.
#define ORDERED_PUSH to 1 if you want PUSH-ICs for function arguments to
be generated from left to right instead of right to left.
#define HAVE_REGPARMS to 1 if the default function-call-mechanism
uses register parameters. If you use this you also have to define a
struct reg_handle {...}
This is used by the compiler to find out which register it should pass
arguments in. In machine.c you have to define an initialized
variable
struct reg_handle empty_reg_handle;
which represents the default state and a function
int reg_parm(struct reg_handle *, struct Typ *);
which returns the number of the register the next argument will be
passed in (or 0 if the argument is not passed in a register) and
updates the reg_handle in a way that successive calls to reg_parm()
yield the correct register for every argument.
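A sketch of what this could look like for an invented calling convention in which the first two integer or pointer arguments are passed in registers 1 and 2 and everything else goes on the stack. The convention, the contents of struct reg_handle, the reduced struct Typ and the value of NQ are all assumptions for this illustration.

```c
#define CHAR     1
#define LONG     4
#define POINTER  8
#define NQ      15            /* ASSUMED mask for the base type */

struct Typ { int flags; };    /* reduced stand-in for vbcc's struct Typ */

/* State for the invented convention: number of registers handed out. */
struct reg_handle { int gprs_used; };

/* Default state: no argument registers used yet. */
struct reg_handle empty_reg_handle = { 0 };

/* Return the register for the next argument, or 0 for the stack. */
int reg_parm(struct reg_handle *h, struct Typ *t)
{
    int base = t->flags & NQ;

    /* only integers and pointers travel in registers here */
    if (!((base >= CHAR && base <= LONG) || base == POINTER))
        return 0;
    if (h->gprs_used >= 2)
        return 0;
    return ++h->gprs_used;
}
```

The caller starts from a copy of empty_reg_handle for every function call, so successive calls to reg_parm() see the arguments in order.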
MACHINE.C
This is the main part of the code generator. The first statement
should be #include "supp.h" which will include all necessary
declarations.
The following variables and functions must be provided by machine.c.
NAME AND COPYRIGHT
The code generator must define a zero-terminated character array
containing name and copyright-notice of the code-generator.
COMMANDLINE OPTIONS
You can use code generator specific commandline options.
The number of flags is specified as MAXGF in machine.h.
Insert the names for the flags as char *g_flags_name[MAXGF].
If an option was specified (g_flags[i]&USEDFLAG) is not zero.
In int g_flags[MAXGF] you can also choose how the options are to be
used:
0 The option can only be specified. E.g. if
g_flags_name[2]=="myflag", the commandline may contain
"-myflag" and (g_flags[2]&USEDFLAG)!=0.
VALFLAG The option must be specified with an integer constant, e.g.
"-myflag=1234". This value can be found in g_flags_val[2].l
then.
STRINGFLAG The option must be specified with a string, e.g.
"-myflag=Hallo". The pointer to the string can be found in
g_flags_val[2].p then.
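The declarations might look like the following sketch. The bit values of USEDFLAG, VALFLAG and STRINGFLAG, the name of the union type and the option names are assumptions; in a real code generator the constants and g_flags_val come from the vbcc headers. stack_limit() is a hypothetical helper.

```c
#define MAXGF 3

/* ASSUMED bit values; the real ones come from the vbcc headers. */
#define USEDFLAG   1
#define STRINGFLAG 2
#define VALFLAG    4

/* Assumed shape of an option value: members 'l' and 'p' as used above. */
union flag_val { long l; char *p; };

/* Invented options: -quiet, -stack-limit=<n>, -cpu=<name> */
char *g_flags_name[MAXGF] = { "quiet", "stack-limit", "cpu" };
int g_flags[MAXGF]        = { 0, VALFLAG, STRINGFLAG };
union flag_val g_flags_val[MAXGF];

/* Value of -stack-limit=<n>, or a default if it was not given. */
static long stack_limit(void)
{
    return (g_flags[1] & USEDFLAG) ? g_flags_val[1].l : 4096L;
}
```

Both arrays are initialized statically here because the compiler reads them before any function of the code generator is called.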
DATA TYPES
The array zlong align[16] must contain the necessary alignments for every
type in bytes. Some of the entries in this array are not actually
used, but align[type&15] must yield the correct alignment for every type.
align[CHAR] must be 1.
zlong maxalign; must be set to an alignment in bytes that is used when
pushing arguments on the stack.
The array zlong sizetab[16] must contain the sizes of every type in bytes.
The array zlong t_min[32] must contain the smallest number for every
integer type (including unsigned ones).
The array zulong t_max[32] must contain the greatest number for every
integer type (including unsigned ones).
As zlong and zulong may be no elementary types on the host machine those
arrays have to be initialized dynamically.
Also note that if you want the code generator to be portable those values
may not be representable as constants by the host architecture and have to
be calculated using the functions for arithmetic on the target's data
types. E.g. the smallest representable value of a 32bit twos-complement
data type is not guaranteed to be valid on every ANSI C implementation.
Also note that you may not use simple operators on the target data types
but you have to use the functions or convert them to an elementary
type of the host machine before (if you know that it is representable
as such).
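A sketch of such a dynamic initialization for an invented target with 32bit int and long and 2-byte maximum alignment. The stand-ins for zlong, l2zl() and zladd() assume a host long of at least 32 bits; init_types() is a hypothetical helper that init_cg() could call. Note how the smallest 32bit value is computed rather than written as a constant.

```c
/* STAND-INS for the generated dt.h, assuming a suitable host long. */
typedef long zlong;
static zlong l2zl(long x)            { return x; }
static zlong zladd(zlong a, zlong b) { return a + b; }

#define CHAR      1
#define SHORT     2
#define INT       3
#define LONG      4
#define UNSIGNED 16

zlong align[16];
zlong sizetab[16];
zlong t_min[32];

/* Fill the tables for the invented target. */
static void init_types(void)
{
    static const long sizes[] = { 0, 1, 2, 4, 4 };  /* index: type */
    int t;

    for (t = CHAR; t <= LONG; t++) {
        sizetab[t] = l2zl(sizes[t]);
        align[t] = l2zl(sizes[t] > 2 ? 2L : sizes[t]);
        t_min[t | UNSIGNED] = l2zl(0L);
    }
    t_min[CHAR]  = l2zl(-128L);
    t_min[SHORT] = l2zl(-32768L);
    /* -2147483648 may not be a valid host constant, so compute it */
    t_min[INT]   = zladd(l2zl(-2147483647L), l2zl(-1L));
    t_min[LONG]  = t_min[INT];
}
```

The remaining entries (pointers, floating point types, t_max) would be filled in the same manner.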
REGISTER SET
The valid registers are numbered from 1..MAXR.
The array char *regnames[MAXR+1] must contain the names for every register.
zlong regsize[MAXR+1] must contain the size of each register in bytes.
This is used to create storage if registers have to be saved.
int regscratch[MAXR+1] must contain information whether a register is
a scratchregister i.e. may be destroyed during a function call (1 or 0).
vbcc will generate code to save/restore all scratch-registers which are
assigned a value when calling a function. However if the code generator
uses additional scratch-registers it has to take care to save/restore
them.
Also the code generator must save/restore used non-scratch-registers
on function entry/exit.
int regsa[MAXR+1] must contain information whether a register is in use
or not at the beginning of a function (1 or 0).
The compiler will not use any of those registers for register variables
or temporaries.
You _must_ set regscratch[i] = 0 if regsa[i] == 1. If you want it to be
saved across function calls the code generator has to take care of this.
You should order the registers so that the ones that should be used
first have the smallest number and it is recommended that registers
which are used to pass return values be assigned the lowest numbers.
Also you may reserve certain registers to the code generator. This may
be reasonable if many ICs cannot be converted without using additional
registers.
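For an invented target with four registers the tables could be set up like this. Register names, sizes and roles are made up; zlong and l2zl() are stand-ins for the generated dt.h, and init_regs() and regs_consistent() are hypothetical helpers.

```c
/* STAND-INS for the generated dt.h. */
typedef long zlong;
static zlong l2zl(long x) { return x; }

#define MAXR 4   /* invented target: d0, d1, a0 and sp */

/* Index 0 is unused because valid registers are numbered 1..MAXR. */
char *regnames[MAXR + 1]   = { "noreg", "d0", "d1", "a0", "sp" };
int   regscratch[MAXR + 1] = { 0, 1, 1, 0, 0 };
int   regsa[MAXR + 1]      = { 0, 0, 0, 0, 1 };  /* sp is always in use */
zlong regsize[MAXR + 1];

/* zlong tables are filled at run time, e.g. from init_cg(). */
static void init_regs(void)
{
    int i;

    for (i = 1; i <= MAXR; i++)
        regsize[i] = l2zl(4L);
}

/* Check the rule that regscratch[i] must be 0 where regsa[i] is 1. */
static int regs_consistent(void)
{
    int i;

    for (i = 1; i <= MAXR; i++)
        if (regsa[i] && regscratch[i])
            return 0;
    return 1;
}
```

The register used for return values (d0 in this invented layout) has the lowest number, following the recommendation above.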
FUNCTIONS
The following functions have to be implemented by the code generator:
- int init_cg(void);
This function is called after the commandline arguments are parsed.
It can set up certain internal data, etc. The arrays regarding the
data types and the register set, can be set up at this point rather
than with a static initialization, however the arrays regarding the
commandline options have to be static initialized.
The results of the commandline options are available at this point.
If something goes wrong, 0 has to be returned, otherwise 1.
- void cleanup_cg(FILE *f)
This function is called before the compiler exits. f is the output file
which _must_ be checked against 0 before using.
- int freturn(struct Typ *t);
This function has to return the number of the register return
values of type t are passed in. If the type is not passed in a
register, 0 must be returned.
- int regok(int r, int t, int mode);
Check whether the type t can be stored in register r; return 0, if not.
If t==POINTER and mode==0 the register only has to be able to store the
pointer, but if mode!=0 it has to be able to dereference the pointer.
If t==0 return whether the register can be used to store condition codes.
This is only relevant if multiple_ccs is set to 1.
- int dangerous_IC(struct IC *p)
Check if this IC can raise exceptions or is otherwise dangerous.
Movement of ICs which are dangerous is restricted to preserve the
semantics of the program.
Typical dangerous ICs are divisions or pointer dereferencing. On certain
targets floating point or even signed integer arithmetic can raise
exceptions, too.
- int must_convert(np p, int t)
Check if type in p does not have to be converted to type t. E.g. on
many machines certain types have identical representations (integers
of the same size or pointers and integers of the same size).
WARNING: Arguments of this function may change in the future!
- int shortcut(int code, int t)
In C no operations are done with chars and shorts because of integral
promotion. However sometimes vbcc might see that an operation could
be performed with the short types yielding the same result.
Before generating such an instruction with short types vbcc will ask
the code generator by calling shortcut() to find out whether it should
do so. Return true iff it is a win to perform the operation code with
type t rather than promoting the operands and using int or so.
- void gen_code(FILE *f, struct IC *p, struct Var *v, zlong offset);
This function has to generate the output for a function to stream f.
v is a pointer to the function which contains the name of the function.
p is a pointer to the first IC, that has to be converted.
offset is the space needed for local variables in bytes.
This function has to take care that only scratchregisters are destroyed
by this function. The array regused contains information about the
registers that have been used by vbcc in this function. However if the
code generator uses additional registers it has to take care of them,
too.
The regs[] and regused[] arrays may be overwritten by gen_code() as well
as parts of the list of ICs. However the list of ICs must still be a
valid list of ICs after gen_code() returned.
- void gen_ds(FILE *f, zlong size, struct Typ *t);
Has to print output that generates size bytes of type t initialized
with proper 0.
t is a pointer to a struct Typ which contains the precise type of
the variable. On machines where every type can be initialized to 0
by setting all bits to zero the type does not matter.
- void gen_align(FILE *f, zlong align);
Has to print output that ensures the following data to be aligned to
align bytes.
- void gen_var_head(FILE *f, struct Var *v);
Has to print the head of a static or external variable v. This includes
the label and necessary information for external linkage etc.
Typically variables will be generated by a call to gen_align followed
by gen_var_head and (a series of) calls to gen_dc and/or gen_ds.
- void gen_dc(FILE *f, int t, struct const_list *p);
Well..I'm too lazy at the moment to describe that...
Also arguments may still change in the future.
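To give an impression of the smaller functions, here is a sketch of regok() and shortcut() for an invented target where registers 1 and 2 hold integers, registers 3 and 4 are the only ones that can be dereferenced, and there are no floating point or condition code registers. The register layout, the value of NQ and the choice of operations accepted by shortcut() are assumptions.

```c
#define CHAR     1
#define SHORT    2
#define INT      3
#define LONG     4
#define POINTER  8
#define NQ      15   /* ASSUMED mask for the base type */

#define OR      16
#define XOR     17
#define AND     18
#define ADD     27
#define SUB     28

#define MAXR     4   /* invented: 1,2 = data, 3,4 = address registers */

int regok(int r, int t, int mode)
{
    if (r < 1 || r > MAXR)
        return 0;
    if (t == 0)                 /* no condition code registers */
        return 0;
    t &= NQ;
    if (t == POINTER)           /* dereferencing needs an address register */
        return mode == 0 || r >= 3;
    if (t >= CHAR && t <= LONG) /* integers only in data registers */
        return r <= 2;
    return 0;                   /* floats etc. never live in registers */
}

int shortcut(int code, int t)
{
    t &= NQ;
    if (t != CHAR && t != SHORT)
        return 0;
    switch (code) {
    case ADD: case SUB: case AND: case OR: case XOR:
        return 1;               /* assumed to be cheap in short types */
    default:
        return 0;
    }
}
```

Which operations are worth accepting in shortcut() depends entirely on the instruction set; the list above is only a plausible starting point.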
Volker Barthelmann volker@vb.franken.de